Semi-supervised learning for text classification using feature affinity regularization
نویسندگان
چکیده
Most conventional semi-supervised learning methods attempt to directly include unlabeled data into training objectives. This paper presents an alternative approach that learns feature affinity information from unlabeled data, which is incorporated into the training objective as regularization of a maximum entropy model. The regularization favors models for which correlated features have similar weights. The method is evaluated in text classification, where feature affinity can be computed from feature co-occurrences in unlabeled data. Experimental results show that this method consistently outperforms baseline methods.
منابع مشابه
Semi-supervised Learning of Naive Bayes Classifier with feature constraints
Semi-supervised learning methods address the problem of building classifiers when labeled data is scarce. Text classification is often augmented by rich set of labeled features representing a particular class. As tuple level labling is resource consuming, semi-supervised and weakly supervised learning methods are explored recently. Compared to labeling data instances (documents), feature labeli...
متن کاملOn Propagated Scoring for Semi-supervised Additive Models
In this paper, a semi-supervised modeling framework that combines feature-based (x) data and graph-based (G) data for classification/regression of the response Y is presented. In this semi-supervised setting, Y is observed for a subset of the observations (labeled) and missing for the remainder (unlabeled). The Propagated Scoring algorithm proposed for fitting this model is a semi-supervised fi...
متن کاملSemi-supervised Collaborative Text Classification
Most text categorization methods require text content of documents that is often difficult to obtain. We consider “Collaborative Text Categorization”, where each document is represented by the feedback from a large number of users. Our study focuses on the semisupervised case in which one key challenge is that a significant number of users have not rated any labeled document. To address this pr...
متن کاملExtension of TSVM to Multi-Class and Hierarchical Text Classification Problems With General Losses
Transductive SVM (TSVM) is a well known semi-supervised large margin learning method for binary text classification. In this paper we extend this method to multi-class and hierarchical classification problems. We point out that the determination of labels of unlabeled examples with fixed classifier weights is a linear programming problem. We devise an efficient technique for solving it. The met...
متن کاملEfficient Distributed Semi-Supervised Learning using Stochastic Regularization over Affinity Graphs
We describe a computationally efficient, stochastic graph-regularization technique that can be utilized for the semi-supervised training of deep neural networks in a parallel or distributed setting. We utilize a technique, first described in [13] for the construction of mini-batches for stochastic gradient descent (SGD) based on synthesized partitions of an affinity graph that are consistent wi...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2012